Skip to content

fix: replace manual anomalies with a hampel filter#1997

Open
matthewp wants to merge 6 commits intonpmx-dev:mainfrom
matthewp:feat/hampel-anomaly-detection
Open

fix: replace manual anomalies with a hampel filter#1997
matthewp wants to merge 6 commits intonpmx-dev:mainfrom
matthewp:feat/hampel-anomaly-detection

Conversation

@matthewp
Copy link

@matthewp matthewp commented Mar 8, 2026

🔗 Linked issue

Previous issue: #1707

🧭 Context

The current implementation of anomaly removal is biased due to the manual nature.

This replaces the implementation with one that uses a hampel filter to automatically remove deviations.

📚 Description

The important part here is applyHampelCorrection.

It goes over each data point and creates a sliding window of data points, by default 3 on each side.

Of those data points it finds the median, since a spike can't pull it off center.

Then it measures how spread out the neighbors are by calculating the MAD (see here).

Then it gives the data point a score and if it's above a threshold then it gets replaced with the median of that window and marked as being an anomaly.

Here are two articles about how hampel filters work:

Here's a screenshot of Vite which still shows its spike removed after this change.

Screenshot from 2026-03-08 17-24-36

@vercel
Copy link

vercel bot commented Mar 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
npmx.dev Ready Ready Preview, Comment Mar 11, 2026 10:45pm
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
docs.npmx.dev Ignored Ignored Preview Mar 11, 2026 10:45pm
npmx-lunaria Ignored Ignored Mar 11, 2026 10:45pm

Request Review

@matthewp matthewp changed the title Replace manual anomalies with a hampel filter fix: replace manual anomalies with a hampel filter Mar 8, 2026
@codecov
Copy link

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 31.11111% with 31 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
app/utils/download-anomalies.ts 20.00% 20 Missing and 4 partials ⚠️
app/components/Package/TrendsChart.vue 46.15% 1 Missing and 6 partials ⚠️

📢 Thoughts on this report? Let us know!

@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Lunaria Status Overview

🌕 This pull request will trigger status changes.

Learn more

By default, every PR changing files present in the Lunaria configuration's files property will be considered and trigger status changes accordingly.

You can change this by adding one of the keywords present in the ignoreKeywords property in your Lunaria configuration file in the PR's title (ignoring all files) or by including a tracker directive in the merged commit's description.

Tracked Files

File Note
i18n/locales/az-AZ.json Localization changed, will be marked as complete. 🔄️
i18n/locales/bg-BG.json Localization changed, will be marked as complete. 🔄️
i18n/locales/cs-CZ.json Localization changed, will be marked as complete. 🔄️
i18n/locales/de-DE.json Localization changed, will be marked as complete. 🔄️
i18n/locales/en.json Source changed, localizations will be marked as outdated.
i18n/locales/es.json Localization changed, will be marked as complete. 🔄️
i18n/locales/fr-FR.json Localization changed, will be marked as complete. 🔄️
i18n/locales/hu-HU.json Localization changed, will be marked as complete. 🔄️
i18n/locales/id-ID.json Localization changed, will be marked as complete. 🔄️
i18n/locales/ja-JP.json Localization changed, will be marked as complete. 🔄️
i18n/locales/pl-PL.json Localization changed, will be marked as complete. 🔄️
i18n/locales/ru-RU.json Localization changed, will be marked as complete. 🔄️
i18n/locales/tr-TR.json Localization changed, will be marked as complete. 🔄️
i18n/locales/uk-UA.json Localization changed, will be marked as complete. 🔄️
i18n/locales/zh-CN.json Localization changed, will be marked as complete. 🔄️
i18n/locales/zh-TW.json Localization changed, will be marked as complete. 🔄️
Warnings reference
Icon Description
🔄️ The source for this localization has been updated since the creation of this pull request, make sure all changes in the source have been applied.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

This PR replaces blocklist-based anomaly correction with a Hampel-filter implementation (adds applyHampelCorrection) and removes the DOWNLOAD_ANOMALIES dataset and blocklist helpers. Chart components (TrendsChart, WeeklyDownloadStats) now call the Hampel correction and UI/state for per-package anomalies (per-package anomaly lists, hasAnomalies, range formatting, contribute links, interactive tooltips) were removed or simplified. New Hampel controls (hampelWindow, hampelThreshold) were added to settings and i18n; many locale keys for known anomaly ranges were deleted. Tests were updated to validate applyHampelCorrection.

Possibly related PRs

Suggested reviewers

  • graphieros
  • danielroe
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The PR description clearly explains the transition from manual anomaly removal to a Hampel filter-based approach, including the algorithm details and references.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: beb9630b-e84a-4279-a4f8-362d18f451b4

📥 Commits

Reviewing files that changed from the base of the PR and between d26e250 and 1b0a0c7.

📒 Files selected for processing (5)
  • app/components/Package/TrendsChart.vue
  • app/components/Package/WeeklyDownloadStats.vue
  • app/utils/download-anomalies.data.ts
  • app/utils/download-anomalies.ts
  • test/unit/app/utils/download-anomalies.spec.ts
💤 Files with no reviewable changes (1)
  • app/utils/download-anomalies.data.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
app/components/Package/TrendsChart.vue (1)

1831-1838: Consider using v-model for simpler checkbox binding.

The explicit :checked + @change pattern is functionally correct, but v-model provides equivalent behaviour with less boilerplate.

♻️ Optional simplification
 <input
-  :checked="settings.chartFilter.anomaliesFixed"
-  `@change`="
-    settings.chartFilter.anomaliesFixed = ($event.target as HTMLInputElement).checked
-  "
+  v-model="settings.chartFilter.anomaliesFixed"
   type="checkbox"
   class="accent-[var(--accent-color,var(--fg-subtle))]"
 />

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 91971e88-96ef-48e4-befd-3a0865a5eb4c

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0a0c7 and 17e63ce.

📒 Files selected for processing (17)
  • app/components/Package/TrendsChart.vue
  • i18n/locales/bg-BG.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/de-DE.json
  • i18n/locales/en.json
  • i18n/locales/es.json
  • i18n/locales/fr-FR.json
  • i18n/locales/hu-HU.json
  • i18n/locales/id-ID.json
  • i18n/locales/ja-JP.json
  • i18n/locales/pl-PL.json
  • i18n/locales/ru-RU.json
  • i18n/locales/tr-TR.json
  • i18n/locales/uk-UA.json
  • i18n/locales/zh-CN.json
  • i18n/locales/zh-TW.json
  • i18n/schema.json
💤 Files with no reviewable changes (16)
  • i18n/locales/en.json
  • i18n/locales/bg-BG.json
  • i18n/locales/es.json
  • i18n/locales/de-DE.json
  • i18n/locales/zh-TW.json
  • i18n/locales/fr-FR.json
  • i18n/locales/tr-TR.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/hu-HU.json
  • i18n/schema.json
  • i18n/locales/zh-CN.json
  • i18n/locales/uk-UA.json
  • i18n/locales/ru-RU.json
  • i18n/locales/ja-JP.json
  • i18n/locales/pl-PL.json
  • i18n/locales/id-ID.json

@danielroe
Copy link
Member

danielroe commented Mar 8, 2026

this looks very promising! tagging @jycouet who may have some thoughts on this 🙏

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Great to have a second look into this.

You probably checked my initial PR, #1636 I started with hampel implementation 👍

If I remember correctly, the sweet spot was 2.5 or 3 for vite. But this setting would hide "great start" of a lib. (eg: 0 0 0 0 0 0 20000)
I will pull the branch later (I'm on phone atm) as I'm curious to see it, maybe you have a more robust implem' 👍

Another note, when the start or end of the chart is in an anomalie period, any algo can't fix it. A good example is: if you start the chart in the middle of the vite spike.

Maybe we should add some tests around all this ? (to keep the intent)

Another note 2: I would love to have more than just vite there today ! 😅

@graphieros
Copy link
Contributor

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

That's the drawback of answering on phone
It shows my age 😅

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Thanks for this! A few thoughts after pulling the branch:

Reproduction URL that matters:
http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-09&end=2026-03-07
(because I suspect there's also a day-of-the-week issue atm, see #2005 let's not speak about this here)

"Great start" test cases:

Packages like @bramus/specificity or @sveltejs/sv-utils are good real-world examples of the 0 0 0 0 0 → lots of downloads pattern.
Would be great to add these as test cases (or manual verification) to see how Hampel manages them (or to find good defaults)

PR suggestion

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

// Happy coding

@matthewp
Copy link
Author

matthewp commented Mar 9, 2026

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

I don't follow, as its own what?

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

own

Something like
image

It's own controls. (It's more to be able to tests different senarios)

@matthewp
Copy link
Author

matthewp commented Mar 9, 2026

Yes I can do that.

Skip points without a full symmetric window to avoid flattening real
growth at series edges ("great start" problem). Use a relative check
when MAD is 0 instead of flagging any deviation, so sparse packages
like [0,0,0,1,0,0,0] keep their real activity.
Expose halfWindow and threshold as sliders on a second row in the
data correction panel so users can tune the filter interactively.
@matthewp matthewp force-pushed the feat/hampel-anomaly-detection branch from 17e63ce to 88334f4 Compare March 11, 2026 22:41
@matthewp
Copy link
Author

@jycouet I added sliders and addressed your concerns.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
i18n/locales/en.json (1)

469-477: ⚠️ Potential issue | 🟡 Minor

Rename this section away from “Known anomalies”.

These strings now sit beside Hampel window/threshold controls, but they still describe a curated set of bot/CI spikes. The PR now applies automatic statistical filtering instead, so this wording misrepresents what the toggle does and how the values are changed.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 848ad525-54b8-48c1-961d-cda990fce395

📥 Commits

Reviewing files that changed from the base of the PR and between 17e63ce and bfc1bcb.

📒 Files selected for processing (23)
  • app/components/Package/TrendsChart.vue
  • app/components/Package/WeeklyDownloadStats.vue
  • app/composables/useSettings.ts
  • app/utils/download-anomalies.data.ts
  • app/utils/download-anomalies.ts
  • i18n/locales/az-AZ.json
  • i18n/locales/bg-BG.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/de-DE.json
  • i18n/locales/en.json
  • i18n/locales/es.json
  • i18n/locales/fr-FR.json
  • i18n/locales/hu-HU.json
  • i18n/locales/id-ID.json
  • i18n/locales/ja-JP.json
  • i18n/locales/pl-PL.json
  • i18n/locales/ru-RU.json
  • i18n/locales/tr-TR.json
  • i18n/locales/uk-UA.json
  • i18n/locales/zh-CN.json
  • i18n/locales/zh-TW.json
  • i18n/schema.json
  • test/unit/app/utils/download-anomalies.spec.ts
💤 Files with no reviewable changes (15)
  • i18n/locales/ja-JP.json
  • i18n/locales/zh-TW.json
  • i18n/locales/fr-FR.json
  • i18n/locales/hu-HU.json
  • i18n/locales/es.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/pl-PL.json
  • i18n/locales/de-DE.json
  • i18n/locales/tr-TR.json
  • i18n/locales/ru-RU.json
  • i18n/locales/bg-BG.json
  • i18n/locales/zh-CN.json
  • i18n/locales/id-ID.json
  • i18n/locales/uk-UA.json
  • app/utils/download-anomalies.data.ts
✅ Files skipped from review due to trivial changes (1)
  • i18n/locales/az-AZ.json

Comment on lines +48 to +49
hampelWindow: number
hampelThreshold: number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "useSettings.ts" | head -5

Repository: npmx-dev/npmx.dev

Length of output: 93


🌐 Web query:

VueUse useLocalStorage mergeDefaults shallow merge documentation

💡 Result:

In VueUse, useLocalStorage() uses the same options as useStorage(), and the mergeDefaults behavior is documented on the useStorage page. (vueuse.org)

  • mergeDefaults: true merges the default value with the value read from storage, and for objects it does a shallow merge (top-level keys only). (vueuse.org)
  • For a deep merge, pass a custom function to mergeDefaults (the docs explicitly show this pattern). (vueuse.org)

Example (custom merge = deep merge):

import { useLocalStorage } from '@vueuse/core'
// import deepMerge from '...'

const state = useLocalStorage('key', { a: { b: 1 } }, {
  mergeDefaults: (storageValue, defaults) => deepMerge(defaults, storageValue),
})

Citations:


🏁 Script executed:

cat -n ./app/composables/useSettings.ts

Repository: npmx-dev/npmx.dev

Length of output: 7350


🏁 Script executed:

# Search for migration or normalisation logic in settings/preferences
rg -i "migrat|normaliz|hydrat" --type ts --type tsx -A 3 app/composables/ 2>/dev/null | head -50

Repository: npmx-dev/npmx.dev

Length of output: 43


🏁 Script executed:

# Check usePreferencesProvider.ts for migration patterns
find . -type f -name "*usePreferences*" -o -name "*preferences*" | grep -E "\.(ts|tsx)$"

Repository: npmx-dev/npmx.dev

Length of output: 236


🏁 Script executed:

head -150 ./app/composables/usePreferencesProvider.ts

Repository: npmx-dev/npmx.dev

Length of output: 2609


🏁 Script executed:

# Search for any settings normalisation, migration, or post-hydration logic
rg "chartFilter|hampel" --type ts --type tsx -B 3 -A 3 app/ | head -100

Repository: npmx-dev/npmx.dev

Length of output: 88


🏁 Script executed:

# Check if there's any middleware or plugin that normalises settings
rg "DEFAULT_SETTINGS|STORAGE_KEY|npmx-settings" --type ts --type tsx -B 2 -A 2 app/

Repository: npmx-dev/npmx.dev

Length of output: 88


🏁 Script executed:

# Search for chartFilter and hampel references
rg "chartFilter|hampel" --type ts -B 3 -A 3 app/

Repository: npmx-dev/npmx.dev

Length of output: 1232


🏁 Script executed:

# Search for settings normalisation anywhere
rg "DEFAULT_SETTINGS|STORAGE_KEY.*npmx-settings" --type ts -B 2 -A 2 app/

Repository: npmx-dev/npmx.dev

Length of output: 885


Deep-merge nested chartFilter settings on hydration.

VueUse's useLocalStorage with mergeDefaults: true only performs a shallow merge at the top level. Returning users with an existing stored chartFilter object will not receive the new hampelWindow and hampelThreshold fields—the entire default chartFilter is replaced, leaving the new settings unset until explicitly written. VueUse documentation recommends a custom merge function for nested defaults.

Suggested fix
   if (!settingsRef) {
     settingsRef = useLocalStorage<AppSettings>(STORAGE_KEY, DEFAULT_SETTINGS, {
-      mergeDefaults: true,
+      mergeDefaults: (storageValue, defaults) => ({
+        ...defaults,
+        ...storageValue,
+        connector: { ...defaults.connector, ...(storageValue.connector ?? {}) },
+        sidebar: { ...defaults.sidebar, ...(storageValue.sidebar ?? {}) },
+        chartFilter: {
+          ...defaults.chartFilter,
+          ...(storageValue.chartFilter ?? {}),
+        },
+      }),
     })
   }

Comment on lines +48 to +54
// Only evaluate points that have a full symmetric window.
// Boundary points lack enough context on one side, making them
// prone to false positives (e.g., a "great start" at the end of a series).
for (let i = halfWindow; i < values.length - halfWindow; i++) {
const start = i - halfWindow
const end = i + halfWindow
const window = values.slice(start, end + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Boundary spikes still leak back in through date-range trimming.

This loop never scores the first or last halfWindow buckets. Because Package/TrendsChart.vue applies Hampel after fetching the user-selected range, any anomaly that becomes a boundary point after cropping is guaranteed to survive, so narrowing the chart around a spike can reintroduce the very point the default view removed. Please run the filter on padded history and trim afterwards, and add a regression for a cropped-range edge spike.

@jycouet
Copy link
Contributor

jycouet commented Mar 12, 2026

@matthewp I pulled your branch, and see the 2 controls

And with http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-13&end=2026-03-11 I get:

image

I tried also, removing my local storage to start from "fresh", but I still see the big spike.
You have the same on your end? I'm missing something?

I didn't look at the code at all, let me know how you want to proceed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants